WhisperSpeech
- Developed by
- Collabora and LAION
- Model type
- Multilingual Diffusion model for Speech Synthesis
- Task
- Text to Speech
- Model description
- An Open Source text-to-speech system built by inverting Whisper. Previously known as spear-tts-pytorch.
- An easy way to test voice-cloning.
- Designed to be like Stable Diffusion, but for speech.
- Built on top of powerful Open Source models: Whisper from OpenAI to generate semantic tokens and perform transcription, EnCodec from Meta for acoustic modeling, and Vocos from Charactr Inc as the high-quality vocoder.
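
The component list above implies a staged data flow: text is mapped to Whisper-derived semantic tokens, semantic tokens are mapped to EnCodec-style acoustic tokens, and a vocoder turns those into a waveform. The following is a minimal sketch of that flow only. Every name in it (`TTSPipeline`, `toy_text_to_semantic`, `toy_semantic_to_acoustic`, `toy_vocoder`) is a hypothetical stand-in invented for illustration, not the WhisperSpeech API, and the toy stages use simple arithmetic in place of trained models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TTSPipeline:
    """Hypothetical three-stage pipeline; not the real WhisperSpeech API."""
    text_to_semantic: Callable[[str], List[int]]
    semantic_to_acoustic: Callable[[List[int]], List[int]]
    vocoder: Callable[[List[int]], List[float]]

    def synthesize(self, text: str) -> List[float]:
        # Stage 1: text -> semantic tokens (the role Whisper-derived tokens play).
        semantic = self.text_to_semantic(text)
        # Stage 2: semantic tokens -> acoustic tokens (the role EnCodec codes play).
        acoustic = self.semantic_to_acoustic(semantic)
        # Stage 3: acoustic tokens -> waveform samples (the role of the Vocos vocoder).
        return self.vocoder(acoustic)


def toy_text_to_semantic(text: str) -> List[int]:
    # Toy stand-in: one "semantic token" per character, in the range [0, 512).
    return [ord(ch) % 512 for ch in text]


def toy_semantic_to_acoustic(tokens: List[int]) -> List[int]:
    # Toy stand-in: expand each semantic token into two codes in [0, 1024).
    return [code for tok in tokens for code in (tok % 1024, (tok * 7) % 1024)]


def toy_vocoder(codes: List[int]) -> List[float]:
    # Toy stand-in: map each code linearly to a sample in [-1.0, 1.0).
    return [code / 1024.0 * 2.0 - 1.0 for code in codes]


pipeline = TTSPipeline(toy_text_to_semantic, toy_semantic_to_acoustic, toy_vocoder)
waveform = pipeline.synthesize("hello")
print(len(waveform))
```

Keeping the stages as separate callables mirrors the modular design described above, where each stage is backed by a different model and could in principle be swapped independently.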